symbolic graph
Reasoning Models Reason Well, Until They Don't
Rameshkumar, Revanth, Huang, Jimson, Sun, Yunxin, Xia, Fei, Saparov, Abulhair
Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest complexity. We revisit these findings through the lens of large reasoning models (LRMs) -- LLMs fine-tuned with incentives for step-by-step argumentation and self-verification. LRM performance on graph and reasoning benchmarks such as NLGraph seems extraordinary, with some even claiming that LRMs are capable of generalized reasoning and innovation in reasoning-intensive fields such as mathematics, physics, medicine, and law. However, by more carefully scaling the complexity of reasoning problems, we show that existing benchmarks actually have limited complexity. We develop a new dataset, the Deep Reasoning Dataset (DeepRD), along with a generative process for producing unlimited examples of scalable complexity. We use this dataset to evaluate model performance on graph connectivity and natural language proof planning. We find that the performance of LRMs drops abruptly at sufficient complexity and does not generalize. We also relate our LRM results to the distributions of the complexities of large, real-world knowledge graphs, interaction graphs, and proof datasets. We find that the majority of real-world examples fall inside the LRMs' success regime, yet the long tails expose substantial failure potential. Our analysis highlights the near-term utility of LRMs while underscoring the need for new methods that generalize beyond the complexity of examples in the training distribution.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
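To make "scalable complexity" in a graph connectivity task concrete, here is a minimal sketch in Python. It is not the DeepRD generative process, which the abstract does not describe. The function names, the use of shortest-path length as the complexity knob, and the leaf-node distractor scheme are all assumptions made for illustration.

```python
import random
from collections import deque


def make_connectivity_problem(path_len, num_distractors, seed=0):
    """Hypothetical generator: one start->goal path plus dead-end branches.

    The true path has exactly `path_len` edges. Each distractor edge leaves a
    path node and ends in a fresh leaf, so it never creates a shortcut.
    """
    rng = random.Random(seed)
    path = list(range(path_len + 1))
    edges = [(path[i], path[i + 1]) for i in range(path_len)]
    next_id = path_len + 1
    for _ in range(num_distractors):
        src = rng.choice(path[:-1])      # branch off anywhere before the goal
        edges.append((src, next_id))     # dead-end leaf node
        next_id += 1
    rng.shuffle(edges)                   # hide the path order from the solver
    return edges, path[0], path[-1]


def shortest_path_length(edges, start, goal):
    """Breadth-first search; returns the edge count or None if unreachable."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return dist[u]
        for v in adjacency.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


edges, start, goal = make_connectivity_problem(path_len=5, num_distractors=10)
print(shortest_path_length(edges, start, goal))
```

Raising `path_len` lengthens the required reasoning chain, while `num_distractors` adds branching without changing the answer, so the two can be varied independently in this sketch.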
Supplementary Material for Paper "Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs" A Criteria for Node Equality When Merging Traces
TraceGraph, it compares the type, attributes, and the executed location of each operation. For example, the MatMul operation of TensorFlow has 'MatMul' as GraphGenerator fails to match because of the different attributes. The pushed call id is popped when the function returns. As with the call id stack, Terra manages the loop id stack for the entire program execution. The current implementation of Terra does not consider multi-threading yet.
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > United States > California > Alameda County > Berkeley (0.04)
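The node-equality criteria in the Terra supplementary excerpt above (type, attributes, and executed location, with a call id stack that is pushed on entry and popped on return) can be sketched as follows. This is an illustrative Python sketch, not Terra's implementation. `TraceNode`, `nodes_equal`, `CallIdStack`, and the encoding of a location as a (call id stack, loop id stack, line) tuple are hypothetical names and choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceNode:
    """Hypothetical record for one traced operation."""
    op_type: str        # e.g. "MatMul"
    attributes: tuple   # sorted (name, value) pairs
    location: tuple     # assumed: (call id stack, loop id stack, source line)


def nodes_equal(a, b):
    """Two nodes match only if type, attributes, and location all agree."""
    return (a.op_type == b.op_type
            and a.attributes == b.attributes
            and a.location == b.location)


class CallIdStack:
    """Call ids are pushed when a function is entered and popped on return."""

    def __init__(self):
        self._stack = []

    def push(self, call_id):
        self._stack.append(call_id)

    def pop(self):
        return self._stack.pop()

    def snapshot(self):
        # Immutable copy, usable as part of a node's location.
        return tuple(self._stack)


calls = CallIdStack()
calls.push(1)                                  # entering a function
here = (calls.snapshot(), (), 10)              # empty loop id stack, line 10
a = TraceNode("MatMul", (("transpose_a", False),), here)
b = TraceNode("MatMul", (("transpose_a", True),), here)
print(nodes_equal(a, a), nodes_equal(a, b))    # same type, different attributes
calls.pop()                                    # function returned
```

In this sketch, two `MatMul` nodes at the same location still fail to match when an attribute differs, which mirrors the attribute mismatch the excerpt mentions.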
Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs
The rapid evolution of deep neural networks (DNNs) has been fueled by the support of deep learning (DL) frameworks like TensorFlow and PyTorch. DL frameworks allow users to build and execute DNNs through Python programming. The standard execution model in DL frameworks is imperative execution: the Python Interpreter executes a DL program just as it treats a regular Python program. Let us go over a simple DL program to grasp the concept. Here, we assume that the condition the Interpreter first evaluates is True.
Theano Tutorial - Marek Rei
This is an introductory tutorial on using Theano, the Python library. I'm going to start from scratch and assume no previous knowledge of Theano. However, understanding how neural networks work will be useful when getting to the code examples towards the end. I recently gave this tutorial as a talk at the University of Cambridge and it turned out to be way more popular than expected. In order to give more people access to the material, I'm now writing it up as a blog post. I do not claim to know everything about Theano, and I constantly learn new things myself.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.24)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Quebec > Montreal (0.04)